Word Sense Disambiguation for Arabic Text Categorization
Abstract
In this paper, we present two contributions to Arabic Word Sense Disambiguation. First, we propose using two external resources, Arabic WordNet (AWN) and WordNet (WN), through a term-to-term Machine Translation System (MTS). The second contribution concerns the disambiguation strategy: for each ambiguous term, we choose the nearest concept, that is, the candidate concept that has the most relationships with the other concepts in the same local context. To evaluate the accuracy of the proposed method, we conducted several experiments using two feature selection methods, Chi-Square and CHIR, and two machine learning techniques, Naïve Bayes (NB) and Support Vector Machine (SVM). The results show that the proposed method greatly improves the performance of our Arabic Text Categorization System.
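The disambiguation strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate lists, the relation graph, the English stand-in terms, and the function name `disambiguate` are all hypothetical, and the score is a plain count of related concepts found in the local context, which is only one plausible reading of "nearest concept".

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy lexicon: each term maps to its candidate concepts.
CANDIDATES: Dict[str, List[str]] = {
    "bank": ["bank.finance", "bank.river"],
    "money": ["money.currency"],
    "loan": ["loan.credit"],
    "river": ["river.stream"],
}

# Hypothetical relation graph: concept -> concepts it is related to.
RELATED: Dict[str, Set[str]] = {
    "bank.finance": {"money.currency", "loan.credit"},
    "bank.river": {"river.stream", "water.body"},
    "money.currency": {"bank.finance", "loan.credit"},
    "loan.credit": {"bank.finance", "money.currency"},
    "river.stream": {"bank.river", "water.body"},
}


def disambiguate(term: str, context: List[str]) -> Optional[str]:
    """Pick the candidate concept of `term` with the most relations
    to the concepts of the other terms in the local context."""
    candidates = CANDIDATES.get(term, [])
    if not candidates:
        return None  # term is not covered by the toy lexicon

    # Collect every candidate concept of the surrounding terms.
    context_concepts: Set[str] = set()
    for other in context:
        if other != term:
            context_concepts.update(CANDIDATES.get(other, []))

    def score(concept: str) -> int:
        # Number of context concepts directly related to this candidate.
        return len(RELATED.get(concept, set()) & context_concepts)

    # On a tie, max() keeps the first candidate in list order.
    return max(candidates, key=score)


if __name__ == "__main__":
    print(disambiguate("bank", ["bank", "money", "loan"]))
    print(disambiguate("bank", ["river", "bank"]))
```

In a real system the candidate concepts and relations would come from AWN and WN rather than from hand-written dictionaries, and the tie-breaking rule would need a deliberate choice.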
Similar resources
Natural Language Processing based Soft Computing Techniques
This paper presents the implementation of soft computing (SC) techniques in the field of natural language processing. An attempt is made to design and implement an automatic tagger that extracts free text and then tags it. Part-of-speech (POS) tagging is the process of categorizing words based on their meaning, function, and type (noun, verb, adjective, etc.). Two stages tagging system ba...
The Role of Word Sense Disambiguation in Automated Text Categorization
Automated Text Categorization has reached the levels of accuracy of human experts. Provided that enough training data is available, it is possible to learn accurate automatic classifiers by using Information Retrieval and Machine Learning techniques. However, the performance of this approach is degraded by problems arising from language variation (especially polysemy and synonymy). We investigate...
The learning vector quantization algorithm applied to automatic text classification tasks
Automatic text classification is an important task for many natural language processing applications. This paper presents a neural approach to develop a text classifier based on the Learning Vector Quantization (LVQ) algorithm. The LVQ model is a classification method that uses a competitive supervised learning algorithm. The proposed method has been applied to two specific tasks: text categori...
Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus
Wednesday, June 15th 8:00 Conference Registration (Registration desk) 8:45 Session 1: Large-Scale Online Linguistic Resources (I) Chair: "Text Categorization Based on Subtopic Clusters" Francis Chik, Robert Luk, Korris Chung "Automatic Filtering of Bilingual Corpora for Statistical Machine Translation" Shahram Khadivi, Hermann Ney "The Role of Word Sense Disambiguation in Automated Text Categor...
Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models
There is a rich flora of word space models that have proven their efficiency in many different applications including information retrieval (Dumais et al., 1988), word sense disambiguation (Schütze, 1993), various semantic knowledge tests (Lund et al., 1995; Karlgren and Sahlgren, 2001), and text categorization (Sahlgren and Karlgren, 2005). Based on the assumption that each model captures some...
Journal title:
- Int. Arab J. Inf. Technol.
Volume 13, Issue
Pages -
Publication date 2016